# Data

This folder includes data used in the research for the "Missing Data, Speculative Reading" article.

## Source data

Copies of published Shakespeare and Company Project dataset files are included for convenience.

Current versions should be obtained from the Project site, and should be cited as listed there:

https://shakespeareandco.princeton.edu/about/data/


## Research data

Data files in this folder were generated as part of the research for this article, or contain data not published elsewhere.

### Book acquisition catalog

- beach_lendinglibrary_catalog.csv

This file is a spreadsheet of acquisitions compiled by Robert Chiossi for the Project from
an inventory in the Sylvia Beach Papers.

“Inventories, Order Records, Clients; Sylvia Beach Papers, C0108,” (n.d.), Manuscripts Division, Department of Special Collections, Princeton University Library, findingaids.princeton.edu/catalog/C0108_c02205.

### Partial borrowers

Members with extant but incomplete borrowing records. CSV files list these members and their subscriptions without documented borrowing activity. The collapsed version consolidates sequential or near-sequential subscriptions.

The files were generated by [identify_partial_borrowers.py](../speculative_reading/identify_partial_borrowers.py)

- partial_borrowers.csv
- partial_borrowers_collapsed.csv
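
As an illustration only, consolidating sequential or near-sequential subscriptions can be sketched as below. This is not the logic of `identify_partial_borrowers.py`, which may differ. The column names (`member_id`, `start_date`, `end_date`), the function name, and the 30-day gap threshold are all hypothetical, and the sketch assumes a member's subscriptions do not overlap or nest.

```python
import pandas as pd


def collapse_subscriptions(subs: pd.DataFrame, max_gap_days: int = 30) -> pd.DataFrame:
    """Merge each member's subscriptions separated by at most max_gap_days.

    Column names and the default gap are hypothetical, for illustration only.
    """
    subs = subs.sort_values(["member_id", "start_date"]).copy()
    # end date of the same member's previous subscription (NaT for the first)
    prev_end = subs.groupby("member_id")["end_date"].shift()
    gap = (subs["start_date"] - prev_end).dt.days
    # a new group starts at a member's first subscription or after a long gap
    new_group = gap.isna() | (gap > max_gap_days)
    subs["group"] = new_group.cumsum()
    return (
        subs.groupby(["member_id", "group"], as_index=False)
        .agg(start_date=("start_date", "min"), end_date=("end_date", "max"))
        .drop(columns="group")
    )


# made-up rows: member "a" has two near-sequential subscriptions and a later one
subs = pd.DataFrame(
    {
        "member_id": ["a", "a", "a", "b"],
        "start_date": pd.to_datetime(
            ["1925-01-01", "1925-02-05", "1926-06-01", "1925-03-01"]
        ),
        "end_date": pd.to_datetime(
            ["1925-02-01", "1925-03-05", "1926-07-01", "1925-04-01"]
        ),
    }
)
collapsed = collapse_subscriptions(subs)
```

With these made-up rows, the first two subscriptions for member `a` are four days apart and merge into a single range, while the 1926 subscription stays separate.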


### Long-borrow overrides

In the course of our research, we discovered long-duration borrow events (longer than a year) that had been entered incorrectly. These errors are present in the v1.2 datasets; corrections have been submitted to the Shakespeare and Company Project. Because the errors affect our estimates, we include a list of overrides and a mechanism for applying them.

- long_borrow_overrides.csv

#### Incorporating long borrow corrections

The long borrow corrections are meant to be used with the 1.2 version of the dataset. They can be incorporated like this:

```python
import pandas as pd

events_df = pd.read_csv("SCoData_events_v1.2_2022-01.csv")
borrow_overrides = pd.read_csv("long_borrow_overrides.csv")

for borrow in borrow_overrides.itertuples():
    member_item_borrows = events_df[
        (events_df.event_type == "Borrow")
        & (events_df.member_uris == borrow.member_uris)
        & (events_df.item_uri == borrow.item_uri)
    ]
    if borrow.match_date == "start_date":
        # get the *index* of the row to update
        update_index = member_item_borrows.index[
            member_item_borrows.start_date == borrow.start_date
        ]
    elif borrow.match_date == "end_date":
        update_index = member_item_borrows.index[
            member_item_borrows.end_date == borrow.end_date
        ]

    # update with correct dates & borrow duration; .loc accepts the Index
    # of matching rows, whereas .at is meant for a single scalar label
    events_df.loc[update_index, "start_date"] = borrow.start_date
    events_df.loc[update_index, "end_date"] = borrow.end_date
    events_df.loc[
        update_index, "borrow_duration_days"
    ] = borrow.borrow_duration_days
```
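
The same logic can also be wrapped in a function and checked on a small example. The following is a minimal sketch, assuming the overrides file has the columns used in the loop above (`member_uris`, `item_uri`, `match_date`, `start_date`, `end_date`, `borrow_duration_days`). The function name `apply_borrow_overrides`, the `example.com` URIs, and the rows are made up for illustration and are not part of the dataset.

```python
import pandas as pd


def apply_borrow_overrides(events: pd.DataFrame, overrides: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of events with corrected values for overridden borrows."""
    events = events.copy()
    for fix in overrides.itertuples():
        # match_date names the date column used to locate the event
        matches = (
            (events["event_type"] == "Borrow")
            & (events["member_uris"] == fix.member_uris)
            & (events["item_uri"] == fix.item_uri)
            & (events[fix.match_date] == getattr(fix, fix.match_date))
        )
        for field in ["start_date", "end_date", "borrow_duration_days"]:
            events.loc[matches, field] = getattr(fix, field)
    return events


# made-up events: the first borrow has an end date that is a year too late
events = pd.DataFrame(
    {
        "event_type": ["Borrow", "Borrow"],
        "member_uris": ["https://example.com/members/a/", "https://example.com/members/b/"],
        "item_uri": ["https://example.com/books/x/", "https://example.com/books/y/"],
        "start_date": ["1925-01-01", "1930-05-01"],
        "end_date": ["1926-01-15", "1930-05-10"],
        "borrow_duration_days": [379, 9],
    }
)
# made-up override: locate the event by its start date, then correct the rest
overrides = pd.DataFrame(
    {
        "member_uris": ["https://example.com/members/a/"],
        "item_uri": ["https://example.com/books/x/"],
        "match_date": ["start_date"],
        "start_date": ["1925-01-01"],
        "end_date": ["1925-01-15"],
        "borrow_duration_days": [14],
    }
)
corrected = apply_borrow_overrides(events, overrides)

# after applying the overrides, no borrow should be longer than a year
remaining_long = (corrected["borrow_duration_days"] > 365).sum()
```

Working on a copy leaves the loaded events untouched, which makes it easy to compare estimates with and without the corrections.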